ABSTRACT

The eukaryotic linear motif (ELM; http://elm.eu.org)
resource is a hub for collecting, classifying and 
curating information about short linear motifs 
(SLiMs). For >10 years, this resource has provided 
the scientific community with a freely accessible 
guide to the biology and function of linear motifs. 
The current version of ELM contains ~200 different 
motif classes with over 2400 experimentally validated 
instances manually curated from >2000 scientific 
publications. Furthermore, detailed information 
about motif-mediated interactions has been anno- 
tated and made available in standard exchange 
formats. Where appropriate, links are provided to re- 
sources such as switches.elm.eu.org and KEGG 
pathways. 

INTRODUCTION 

In recent years, our understanding of the nature of 
protein-protein interactions has changed dramatically. 
Intrinsically disordered protein regions (IDRs) have been 
established as key facilitators of protein functionality 



(1-4), and consequently, globular domains no longer 
prevail as the sole purveyors of protein function. Short 
linear motifs (SLiMs), a class of compact, degenerate 
and convergently evolvable interaction modules, are the 
predominant functional modules found in intrinsically dis- 
ordered regions (5-7). Interactions mediated by SLiMs, 
also referred to as linear motifs or MiniMotifs, have 
been shown to direct many diverse processes, such as 
controlling cell cycle progression, tagging proteins for 
proteasomal degradation, modulating the efficiency of 
translation, targeting proteins to specific sub-cellular 
localizations and stabilizing scaffolding complexes. 
Undoubtedly, more functions will be revealed in the 
future as additional SLiM instances are characterized. 

SLiMs are represented by a limited number of con- 
strained affinity- and specificity-determining residues 
within peptides that are typically between 3 and 11 
amino acids in length (5,7,8). The compactness of a 
SLiM results in low-affinity binding (typically in the low 
micromolar Kd range) (7,9-12), and consequently, SLiMs
often mediate transient, dynamic and reversible inter- 
actions. As a result of the limited number of binding de- 
terminants in a short linear motif, novel SLiMs can readily 
evolve de novo, adding functionality to a protein. The ease 
of evolution of motifs has resulted in the proliferation of 
SLiMs that encode functions of broad utility and many
motif classes are ubiquitous, occurring in tens or hundreds 
of proteins. Many pathogens have also taken advantage 
of the intrinsic evolutionary plasticity of SLiMs by 
mimicking host motifs to deregulate and repurpose host 
pathways (13,14). 

On a higher regulatory level, short linear motifs often 
exhibit complex switching behavior by co-operating with 
each other and with post-translational modifications to 
facilitate switching between different functional states of 
a protein, and thus, SLiMs function as key regulatory 
modules that allow for context-dependent, integrative 
regulatory decision-making (15-17). 

THE EUKARYOTIC LINEAR MOTIF (ELM) 
RESOURCE 

The ELM resource was established in 2003 with the 
mission to collect, annotate, classify and detect short 
linear motifs (18). It consists of two main modules: A 
relational database that stores all annotations and a pre- 
diction tool that uses the stored data to detect instances of 
short linear motifs in query sequences submitted by the 
user. The annotated data are manually curated from 
literature and made freely available to the scientific 
community. 

At its core, the ELM database consists of ELM classes 
(grouped hierarchically into ELM types, see below) and 
ELM instances: An ELM class describes the specificity of 
a peptide-binding domain or domain family, usually 
summarized in the syntax of regular expressions. For 
example, the ELM class DEG_SCF_TRCP1_1 describes 
the pattern D(S)G.{2,3}([ST]), whereby the first three
amino acid positions (D,S,G) are fixed positions, 
followed by a flexible gap of either two or three amino 
acids of any type, followed by either an S or T residue. 
In addition, the round brackets around the second and 
last position indicate that these positions have to be 
phosphorylated to have a fully functional motif. The 
website for this ELM class (http://elm.eu.org/elms/elmPages/DEG_SCF_TRCP1_1.html)
summarizes current
literature on the motif, providing information about bio- 
logical context, taxonomic distribution, a set of represen- 
tative class instances, interacting protein domain(s), as well 
as links to primary literature and additional resources. 
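
As a minimal illustration of how such a class definition can be applied, the sketch below scans a sequence with the DEG_SCF_TRCP1_1 expression using Python's standard `re` module. The ELM round brackets act as capture groups in Python, so the positions that must be phosphorylated can be reported directly. The function name `find_motif` and the toy peptide are assumptions made for illustration and are not part of the ELM resource. A regular expression match on its own is only a candidate, not a validated instance.

```python
import re

# Regular expression of ELM class DEG_SCF_TRCP1_1 as given in the text.
# The round brackets mark residues that must be phosphorylated; in Python
# they double as capture groups.
TRCP1_PATTERN = re.compile(r"D(S)G.{2,3}([ST])")


def find_motif(sequence, pattern=TRCP1_PATTERN):
    """Return all (possibly overlapping) matches with 1-based coordinates."""
    hits = []
    pos = 0
    while True:
        match = pattern.search(sequence, pos)
        if match is None:
            break
        hits.append({
            "start": match.start() + 1,  # 1-based, inclusive
            "end": match.end(),          # 1-based, inclusive
            "match": match.group(0),
            # 1-based positions of the bracketed (phosphorylated) residues
            "phospho_sites": [match.start(g) + 1
                              for g in range(1, pattern.groups + 1)],
        })
        pos = match.start() + 1  # restart one residue later to allow overlaps
    return hits


# Hypothetical toy peptide, used only to exercise the function.
toy_sequence = "MKADSGIHSGATTLDSGAATWQ"
for hit in find_motif(toy_sequence):
    print(hit)
```

Restarting the search one residue after each match start, rather than using `finditer`, is a deliberate choice so that overlapping candidates are not silently skipped.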

Most ELM classes have at least one ELM instance 
annotated, whereby an instance describes the 



experimentally determined match of a regular expression 
of an ELM class in a protein sequence. During annotation 
of instances, focus is put on the curation of experimental 
methods. Well-validated instances have been shown by 
more than one method in more than one publication and 
ideally include structural data and interaction information 
in addition to cell assays. Transient over-expression experi- 
ments on their own are not very trustworthy and have led 
to many false positive motifs being reported (19). 
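
The "more than one method in more than one publication" criterion can be written as a simple check. The sketch below is a hypothetical simplification, not the curation procedure or scoring used by ELM; the function name and record layout are assumptions. The example records are transcribed from the evidence listed for the LIG_PTB_Phospho_1 instance shown in Figure 1.

```python
def is_well_validated(evidence):
    """Heuristic: at least two distinct methods and two distinct publications.

    `evidence` is a list of dicts with the (assumed) keys
    "method" and "publication".
    """
    methods = {record["method"] for record in evidence}
    publications = {record["publication"] for record in evidence}
    return len(methods) > 1 and len(publications) > 1


# Evidence records as displayed for the instance in Figure 1.
figure1_evidence = [
    {"method": "nuclear magnetic resonance", "publication": "Oxley,2008"},
    {"method": "nuclear magnetic resonance", "publication": "Deshmukh,2010"},
    {"method": "alanine scanning", "publication": "Calderwood,2003"},
    {"method": "surface plasmon resonance", "publication": "Calderwood,2003"},
]

print(is_well_validated(figure1_evidence))
```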

The ELM resource has been updated significantly since 
the last release (20), with 26 new ELM classes and 689 new 
ELM instances, raising the total count to 197 ELM classes 
and 2404 ELM instances (Tables 1 and 2). 

More updates and changes are described in the follow- 
ing sections. 

NEW TYPES FOR ELM CLASSES 

ELM classes were originally categorized into four differ- 
ent types based on the function of the motif: Proteolytic 
cleavage sites (CLV), general ligand binding sites (LIG), 
sites for post-translational modification (MOD) and 
sub-cellular targeting sites (TRG) [see Table 1 in Gould 
et al. (21)]. 

The annotation of many additional ELM classes made 
it both possible and necessary to introduce novel ELM 
types to categorize motif classes in more detail. Ligand 
binding classes describing docking sites or destruction 
motifs have been grouped together as two new types, 
DOC and DEG, respectively, raising the number of indi- 
vidual ELM types to six. Docking motifs (DOC) can be 
described as motifs that recruit a modifying enzyme using 
a site that is distinct from the active site (22), whereas a 
degron motif (DEG) is a specific region of a protein 
sequence that directs protein polyubiquitylation and 
targets the protein to the proteasome for degradation 
(23). Technically, all docking sites and destruction 
motifs belong to the 'ligand binding sites (LIG)' type; 
however, grouping together motif classes of similar 
function adds an additional level of discrimination. 
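
In the class identifiers quoted in this article, the ELM type appears as the prefix before the first underscore. Assuming that naming convention holds, a small sketch for grouping classes by type could look as follows; the helper name `elm_type` is hypothetical.

```python
from collections import Counter

# The six ELM types described in the text.
ELM_TYPES = {
    "CLV": "proteolytic cleavage sites",
    "DEG": "degrons (destruction motifs)",
    "DOC": "docking sites",
    "LIG": "general ligand binding sites",
    "MOD": "sites for post-translational modification",
    "TRG": "sub-cellular targeting sites",
}


def elm_type(class_identifier):
    """Return the type prefix of an ELM class identifier (assumed convention)."""
    prefix = class_identifier.split("_", 1)[0]
    if prefix not in ELM_TYPES:
        raise ValueError(f"unknown ELM type prefix: {prefix!r}")
    return prefix


# A few class identifiers mentioned in this article.
classes = [
    "DEG_SCF_TRCP1_1",
    "DEG_SCF_TIR1_1",
    "CLV_C14_Caspase3-7",
    "LIG_SUMO_SBM_2",
    "LIG_SPRY_1",
    "MOD_LATS_1",
]
counts = Counter(elm_type(name) for name in classes)
for prefix, n in sorted(counts.items()):
    print(prefix, n, ELM_TYPES[prefix])
```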



NEW FEATURES 

Interactions 

For all ELM classes, the corresponding interacting 
domain that recognizes the particular short linear motif 



Table 1. Summary of data stored in the ELM database (a)

             Functional sites  ELM classes  ELM instances  PDB structures  GO terms                PubMed links
Total        127               197          2404           290             419                     2132
By category                    LIG 103      Human 1391
                               MOD 30       Mouse 211                      Biological process 217  From ELM class 976
                               TRG 23       Rat 115
                               DEG 15       Yeast 86                       Cell compartment 95     From instance 1558
                               DOC 15       Fly 77
                               CLV 11       Other 524                      Molecular function 107

(a) As of September 2013.

Table 2. Novel ELM classes and number of associated instances (middle column) that have been added since the last ELM release (a)

Identifier | Number of instances | Description
CLV_C14_Caspase3-7 | 39 | Caspase3 and Caspase7 cleavage site.
CLV_Separin_Fungi, CLV_Separin_Metazoa | 4, 5 | Separase cleavage site, best known in sister chromatid separation.
DEG_APCC_TPR_1 | 22 | This short C-terminal motif is present in co-activators, the Doc1/APC10 subunit and some substrates of the APC/C and mediates direct binding to TPR-containing APC/C core subunits.
DEG_CRL4_Cdt2_1, DEG_CRL4_Cdt2_2 | 6, 1 | This degron overlaps a PCNA interaction protein (PIP) box and is recognised by the CRL4_Cdt2 ubiquitin ligase in a PCNA- and chromatin-dependent manner.
DEG_SCF_COI1_1 | 9 | Degron motif binding to the COI1 F-Box protein of the SCF E3 ubiquitin ligase in a jasmonate-dependent manner.
DEG_SCF_Skp2-Cks1_1 | 3 | This phosphodegron uniquely requires a pre-assembled target recognition site composed of Skp2 and Cks1.
DEG_SCF_TIR1_1 | 24 | Degron motif present in Aux/IAA transcriptional repressor proteins binding to TIR1/AFB F-box proteins of the SCF E3 ubiquitin ligase in an auxin-dependent manner.
LIG_APCC_Cbox_1, LIG_APCC_Cbox_2 | 3, 2 | Motif in APC/C co-activators that mediates binding to the APC/C core.
LIG_CAP-Gly_2 | 1 | Short, partly aromatic carboxy terminal sequence found in the SLAIN group of microtubule-associated proteins.
LIG_EABR_CEP55_1 | 6 | This proline-rich motif binds to the EABR domain of Cep55 and is involved in both cytokinesis of somatic cells and intercellular bridge formation in differentiating germ cells.
LIG_MYND_2 | 3 | Motif that mediates the interaction between MYND domain of AML1/ETO and co-repressors SMRT and N-CoR.
LIG_MYND_3 | 2 | A variant MYND binding motif found in the HSP90 co-chaperones p23 and FKBP38 interacting with PHD2 MYND domain.
LIG_NBox_RRM_1 | 2 | Amino terminal region on Far Upstream Element (FUSE) binding protein, which mediates the interaction with FIR in order to recruit FIR to FUSE DNA.
LIG_OCRL_FandH_1 | 3 | The F and H motif describes a 10-13-mer peptide sequence determined by a highly conserved phenylalanine and histidine residue surrounded by hydrophobic amino acids. A complex of ASH and RhoGAP-like domain binds the F and H motif within a hydrophobic pocket.
LIG_PAM2_2 | 4 | Peptide ligand motif that directly binds to the MLLE/PABC domain found in poly(A)-binding proteins and HYD E3 ubiquitin ligases, mainly via a common central core region and a complementary C-terminal region.
LIG_SPRY_1 | 2 | Peptide motif binding to the members of the SSB (or SPSB) family (SPRY domain- and SOCS box-containing protein).
LIG_SUMO_SBM_1 | 39 | Motif that mediates binding to SUMO proteins non-covalently.
LIG_SUMO_SBM_2 | 8 | Inverted version of LIG_SUMO_SBM_1 that mediates binding to SUMO proteins non-covalently.
MOD_LATS_1 | 23 | The LATS phosphorylation motif is recognised by the LATS kinases for Ser/Thr phosphorylation. Substrates are often found toward the end of the Hippo signaling pathway.
MOD_NEK2_1, MOD_NEK2_2 | 3, 0 | NEK2 phosphorylation motif; NEK protein kinases play a critical role in cell cycle control, interacting with and phosphorylating several centrosomal proteins.
TRG_PEX_2, TRG_PEX_3 | 2, 1 | Motifs present in peroxisomal import receptors important for binding to peroxisomal membrane proteins (PMPs) or other peroxisomal import receptors.

(a) As of September 2013.



(SLiM) has been annotated (24). In addition, links have 
been provided to Pfam (25) or SMART (26), where more 
detailed information about the respective domain can be 
found. Where possible, the community annotation feature 
of Pfam has been used to curate each interaction domain 
present in ELM as an 'external link' in Pfam/Wikipedia to 
allow the user to easily jump between these resources. 

Furthermore, for >700 ELM instances, the interacting 
protein has been annotated and, if possible, the position of 
the interacting domain as well as the affinity of the inter- 
action has been curated. This information is presented in 
the ELM instance detail page (see Figure 1) and can be 
downloaded in either PSI-MI TAB or PSI-MI XML 2.5 
format (16,27) (see links section on the ELM website). 
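
A downloaded PSI-MI TAB file can be read with a few lines of code. The sketch below assumes the standard 15-column PSI-MI TAB 2.5 layout, in which "-" denotes an empty field and "|" separates multiple values; whether the ELM export uses exactly these columns should be checked against the downloaded file. The column keys, the function name and the example line are all written by hand for illustration and are not copied from an actual ELM export.

```python
# Column order of PSI-MI TAB 2.5 (15 tab-separated columns); key names are
# our own shorthand, not official identifiers.
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_ids_a", "alt_ids_b", "aliases_a", "aliases_b",
    "detection_methods", "first_author", "publication_ids",
    "taxid_a", "taxid_b", "interaction_types", "source_databases",
    "interaction_ids", "confidence",
]


def parse_mitab_line(line):
    """Split one MITAB 2.5 line into a dict of lists (empty list for '-')."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(MITAB25_COLUMNS):
        raise ValueError(f"expected at least 15 columns, got {len(fields)}")
    record = {}
    for key, value in zip(MITAB25_COLUMNS, fields):
        # Naive split: assumes '|' does not occur inside quoted values.
        record[key] = [] if value == "-" else value.split("|")
    return record


# Hand-written illustrative line (not real ELM output).
example = "\t".join([
    "uniprotkb:P05106", "uniprotkb:Q9Y490", "-", "-", "-", "-",
    'psi-mi:"MI:0077"(nuclear magnetic resonance)', "-", "-",
    "taxid:9606", "taxid:9606", "-", "-", "-", "-",
])
record = parse_mitab_line(example)
print(record["id_a"], record["id_b"], record["detection_methods"])
```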

ELMs involved in molecular switches 

As key regulatory interaction modules, linear motifs are 
tightly controlled and many motifs are conditionally 



turned 'on' and 'off' depending on cell state. Pre-translational
addition or removal of a SLiM-containing exon,
post-translational modification of the SLiM-containing 
peptide, allosteric SLiM inhibition or activation and 
SLiM binding site competition are amongst the most 
common mechanisms to regulate linear motifs. The 
switches.ELM database (15) is a resource dedicated to
the curation and representation of experimentally 
validated motif-based molecular switches. It provides a 
categorized repository of >700 manually curated, experi- 
mentally validated instances of SLiM-based switch mech- 
anisms collected from literature. 

Each ELM instance that is part of a switching mechanism
annotated by the switches.ELM resource has additional
information shown on the ELM instance detail
page as indicated in Figure 1: A short description of the 
switching mechanism is displayed with links to all partici- 
pants as well as an illustrative scheme of the switching 
[Figure 1 screenshot: ELM instance detail page for LIG_PTB_Phospho_1 in ITB3_HUMAN (P05106), positions 767-773, with the panels 'Instance', 'Instance evidence' and 'Interactions', followed by two switching mechanisms annotated at the switches.ELM resource.]


Figure 1. Screenshot of the ELM website showing details for an instance of the ELM class LIG_PTB_Phospho_1 in the human protein Integrin
beta-3 at position 767-773. Details about the instance are depicted on top, including a representation of the 3D structure PDB:2L1C showing the
instance bound by 'SHC-transforming protein 1'. Below the instance evidence, which holds details about the methods used in the article, information
regarding the interaction between the linear motif and the domain can be found. Here, three interaction partners containing phosphotyrosine-binding
domains (PTB) are annotated: 'talin-1', 'docking protein 1' and 'SHC-transforming protein 1'. Finally, the two schematics at the bottom illustrate the
involvement of this motif instance in molecular switching mechanisms.



Nucleic Acids Research, 2014, Vol. 42, Database issue D263 



type. More details can be found at the switches.ELM page
(http://switches.elm.eu.org) by clicking on the illustration. 



MOTIFS-COMPLEXES-NETWORKS 

Motif-mediated interactions play an important role in 
complex formation as exemplified by the multi-protein 
complex 'anaphase promoting complex or cyclosome' 
(APC/C) E3 ubiquitin ligase. This complex is represented 
by the novel ELM classes LIG_APCC_Cbox_1 and
DEG_APCC_TPR_1 complementing the existing entries
DEG_APCC_DBOX_1 and DEG_APCC_KENBOX_2.
APC/C controls the progression through the cell cycle 
by ubiquitylation of cell cycle regulators and consists of 
at least 13 subunits assembled into the major subunit and 
two subcomplexes: One subcomplex consists mainly of 
tetratricopeptide repeat (TPR) domain-containing 
proteins (28), while the other subcomplex includes the 
catalytic core with the cullin domain-containing subunit 
Apc2 and the RING domain-containing subunit Apc11
(29). APC/C can ubiquitylate substrates only in the
presence of the WD40 repeat-containing co-activator
proteins Cdc20 or Cdh1, which are active at distinct
phases of the cell cycle. Binding of Cdh1 to the APC/C
is mediated by at least two motifs (Figure 2), a C-Box 
possibly binding to the catalytic Apc2 subunit (31) and a 
C-terminal IR-tail motif (32) (DEG_APCC_TPR_1) 
binding to the tetratricopeptide repeat (TPR) region of 
one subunit of the Cdc27 dimer and possibly additional 
uncharacterized interfaces. 

Destruction of substrate proteins is facilitated by inter- 
actions with the bound co-activators, also mediated via 
short linear motifs (called degradation motifs or 
degrons), such as the well-characterized D-Box (33) and 
KEN-Box motifs (34,35). Strikingly, Cdc20 itself contains 
a KEN-Box, which is therefore recognized by Cdh1,
ensuring the temporal degradation of Cdc20 (36). 

Several pseudosubstrates for APC/C have been 
identified, which, while being able to bind to the APC/C
complex, are not ubiquitylated and thus function as
inhibitors of the APC/C complex (37). One example is 
illustrated in Figure 2: The yeast ACM1 protein was 
identified as a stable binding partner of Cdh1 and an inhibitor
of APC/C-Cdh1 activity, containing three motifs
mediating binding to Cdh1 (38).

Taken together, these motif-mediated interactions of 
the APC/C complex ensure the timely turnover of 
numerous cell-cycle regulatory proteins, thus governing 
eukaryotic cell cycle progression (39). We consider APC/C
to be an excellent paradigm for the domain-motif interplay
that pervades cell regulation.

With the wealth of information available in the 
ELM compendium, it is also possible to map these 
data onto higher-level information systems such as the 
KEGG resource (40). Figure 3 illustrates the human 
phosphatidylinositol-3'-kinase-(PI3K)-Akt signaling 
pathway taken from KEGG (pathway-id: hsa04151). All 
interactions that are motif-mediated and annotated in the 
ELM resource have been mapped onto this pathway, with 
colours indicating the type of ELM class. Although it is 







Figure 2. Motif-mediated regulation of APC/C function. Structure of
the yeast APC/C complex [EMD-1815, determined by cryo-EM;
figure generated with Chimera (30)] with confirmed (full arrows) or
putative (dashed arrows) motif-binding sites indicated. Binding of
the co-activator Cdh1 (blue) to the APC/C is mediated by two
motifs: the C-terminal IR motif binds to the tetratricopeptide repeat
(TPR) region of one subunit of the Cdc27 dimer (green) and the C-Box
binds to the catalytic Apc2 subunit (yellow). The Apc10 (orange)
subunit also contains a C-terminal IR motif, which binds to the TPR
domain of the second Cdc27 subunit (green). Recruitment of substrates
or additional regulators such as the pseudo-substrate ACM1
(PDB:4BH6) also depends on motifs. The A Motif, KEN-Box and
D-Box bind to different sites on the WD40 domain of Cdh1. In
addition, the D-Box also contacts Apc10, which functions as a
co-receptor for this degron (31).



likely that not all motif-mediated interactions are shown 
(simply because of the fact that they have not yet been 
annotated in ELM), the sheer amount of motif involve- 
ment is compelling. Mappings of annotated motifs onto 
other resources as shown in Figure 3 for a KEGG 
pathway can broaden our understanding of the import- 
ance of short linear motifs by allowing us to investigate 
motifs on a broader scale and to inspect their role in 
different regulatory pathways. 

Webservices update 

To allow users programmatic access to query the ELM 
server, SOAP/XML web services had previously been im- 
plemented (21). This system has been updated to a REST 
service model (41), whereby the communication uses the 
HTTP protocol/features to deliver structured data. Users 
now no longer need a special client to talk to the server but 
instead can use any browser or text-mode client (such as 
'curl' or 'wget'). Libraries to request and process HTTP 
queries are an integral part of most programming 
[Figure 3 diagram: KEGG PI3K-Akt signaling pathway with the ELM types DEG, DOC, LIG and MOD highlighted in color.]


Figure 3. Motif-mediated interactions annotated in the ELM resource mapped onto the KEGG (40) human Phosphatidylinositol-3'-kinase-(PI3K)- 
Akt signaling pathway (hsa04151). The direction of arrows denotes pathway direction. A colored border around a protein name indicates a motif 
within this protein is responsible for mediating the interaction to another protein in this pathway, also highlighted by a colored edge: docking motifs
(blue), degrons (green), ligand binding motifs (orange) and modification sites (red). Colored dotted lines represent motif-mediated interactions
mapped by homology. Phosphorylation/dephosphorylation events are indicated as '+p'/'-p' next to a node, respectively.



languages; consequently, programmatically retrieving and
utilizing the information provided by the ELM server 
should be straightforward for bioinformaticians of any 
skill level. A detailed list of URLs that can be used to 
query ELM via REST can be found at the 'downloads' 
page (http://elm.eu.org/downloads.html). We encourage 
users to send us feedback on the new methods as well as 
suggestions for possible future features. 
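
As a rough sketch of such programmatic access, the following uses only the Python standard library. The query URL is deliberately left as a parameter: the actual REST URLs are listed on the downloads page and are not reproduced here. The sketch assumes a tab-separated response with optional '#' comment lines and a header row; the function names, the column names and the sample text are hypothetical.

```python
import csv
import urllib.request


def parse_tsv(text):
    """Parse tab-separated text into a list of dicts, skipping '#' comments."""
    lines = [line for line in text.splitlines()
             if line.strip() and not line.startswith("#")]
    return list(csv.DictReader(lines, delimiter="\t"))


def fetch_tsv(url, timeout=30):
    """Retrieve a TSV document over HTTP and parse it.

    `url` must be taken from the ELM downloads page; none is assumed here.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_tsv(response.read().decode("utf-8"))


# Offline demonstration with hand-written sample text (not real ELM output).
sample = (
    "#hypothetical header comment\n"
    '"identifier"\t"type"\n'
    '"DEG_SCF_TRCP1_1"\t"DEG"\n'
)
rows = parse_tsv(sample)
print(rows[0]["identifier"], rows[0]["type"])
```

Keeping the parsing step separate from the HTTP request makes it possible to test the parser on saved files without contacting the server.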

PERSPECTIVE 

Over the 10 years of ELM availability, the ELM resource 
has proven to be a valuable source of information for 
bench biologists (42-46) as well as for bioinformaticians
performing computational analyses (47-51). It is now 
very clear that linear motifs are central to the understand- 
ing of cell regulation. Their co-operative interactions are 
to be found in all regulatory protein complexes of the 
cell. It does remain computationally challenging to 
discover new motifs by in silico methods, although 
progress is being made. Experimentalists continue to 
report new motif discoveries. Because there seems to be 
no drop in the rate of motif discovery, it can be 
extrapolated that the ~200 motif classes listed in ELM 



surely fall well short of the true number in eukaryotic
proteomes. We shall need to keep on counting for the 
next 10 years and beyond. 